CCG-based Models for Statistical Machine Translation

Author

  • Michael Auli
Abstract

Arguably the best-performing statistical machine translation systems are based on context-free formalisms or weakly equivalent ones. These models usually use a synchronous context-free grammar (SCFG), which we argue is too rigid for the highly ambiguous task of translating human language. The problem is exacerbated by the imperfect methods available for aligning parallel texts, which make extracting an efficient grammar very hard. As a result, the extracted context-free grammars are usually very large, even after being restricted by a variety of constraints. We propose to use Combinatory Categorial Grammar (CCG) for machine translation models. CCG is a lexicalized, mildly context-sensitive formalism that is well suited to capturing long-distance dependencies, which most current models do not address well. We believe CCG suits machine translation for two reasons: it can give a syntactic representation to non-constituents, which occur frequently in parallel texts, and it has high derivational flexibility. This allows us to bring some of the advantages of non-syntactic phrase-based approaches into a syntactic framework, such as a relatively small grammar compared with context-free-based machine translation grammars. A number of models leveraging the advantages of CCG are possible; however, our principal goal is to develop a string-to-tree model that projects CCG onto the target side of a synchronous grammar. We intend to apply the vast progress made in monolingual CCG parsing to machine translation. Additionally, we propose to extend CCG to a synchronous grammar (SCCG), as has been done for related formalisms such as tree-adjoining grammars. We hope that an SCCG may provide derivational flexibility similar to that of monolingual CCG, which may result in a better model of translational equivalence.


Similar resources

CCG Supertags in Factored Statistical Machine Translation

Combinatory Categorial Grammar (CCG) supertags present phrase-based machine translation with an opportunity to access rich syntactic information at the word level. The challenge is incorporating this information into the translation process. Factored translation models allow the inclusion of supertags as a factor in the source or target language. We show that this results in an improvement in t...


Extending CCG-based Syntactic Constraints in Hierarchical Phrase-Based SMT

In this paper, we describe two approaches to extending syntactic constraints in the Hierarchical Phrase-Based (HPB) Statistical Machine Translation (SMT) model using Combinatory Categorial Grammar (CCG). These extensions target the limitations of previous syntax-augmented HPB SMT systems which limit the coverage of the syntactic constraints applied. We present experiments on Arabic–English and ...


A CCG-based Quality Estimation Metric for Statistical Machine Translation

We describe a metric for estimating the quality of Statistical Machine Translation (SMT) output based on syntactic features extracted using Combinatory Categorial Grammar (CCG). CCG has been demonstrated to be better suited to deal with SMT texts than context free phrase structure grammar formalisms. We use CCG features to estimate the grammaticality of the translations by dividing them into ma...


A new model for Persian multi-part words edition based on statistical machine translation

Multi-part words in English are hyphenated, with the hyphen separating the parts. Persian also has multi-part words. Under Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...




Publication date: 2009